Abstract

We consider the design of systems for sequential decentralized detection, a problem that entails 
several interdependent choices: the choice of a stopping rule (specifying the sample size), a global 
decision function (a choice between two competing hypotheses), and a set of quantization rules (the local 
decisions on the basis of which the global decision is made). This paper addresses an open problem 
of whether in the Bayesian formulation of sequential decentralized detection, optimal local decision 
functions can be found within the class of stationary rules. We develop an asymptotic approximation to 
the optimal cost of stationary quantization rules and exploit this approximation to show that stationary 
quantizers are not optimal in a broad class of settings. We also consider the class of blockwise stationary 
quantizers, and show that asymptotically optimal quantizers are likelihood-based threshold rules.

Keywords: decentralized detection; decision-making under constraints; experimental design; hypothesis 
testing; quantizer design; sequential detection. 

1 Introduction 

Detection is a classical discrimination or hypothesis-testing problem, in which observations $\{X_1, X_2, \ldots\}$
are assumed to be drawn i.i.d. from the (multivariate) conditional distribution $\mathbb{P}(\cdot \mid H)$ and the goal is to
infer the value of the random variable $H$, which takes values in $\{0, 1\}$. In a typical engineering application,
the case $\{H = 1\}$ represents the presence of some target to be detected, whereas $\{H = 0\}$ represents its
absence. Placing this problem in a communication-theoretic context, a decentralized detection problem is
a hypothesis-testing problem in which the decision-maker is not given access to the raw data points $X_n$,
but instead must infer $H$ based only on the output of a set of quantization rules or local decision functions,
say $\{U_n = \phi_n(X_n)\}$, which map the raw data to quantized values. This basic problem of decentralized
detection has been studied extensively for several decades [17, 19, 6]; see the overview papers [20, 23, 3, 5]
and references therein for more background. Of interest in this paper is the extension to an online setting:
more specifically, the sequential decentralized detection problem [19, 21, 12] involves a data sequence,
$\{X_1, X_2, \ldots\}$, and a corresponding sequence of summary statistics, $\{U_1, U_2, \ldots\}$, determined by a
sequence of local decision rules $\{\phi_1, \phi_2, \ldots\}$. The goal is to design the local decision functions and to
specify a global decision rule so as to predict $H$ in a manner that optimally trades off accuracy and delay.
In short, the sequential decentralized detection problem is the communication-constrained extension
of the classical formulation of sequential centralized decision-making problems (see, e.g., [?, 15, 10]) to the
decentralized setting.

In setting up a general framework for studying sequential decentralized problems, Veeravalli et al. [22]
defined five problems, denoted "Case A" through "Case E," distinguished from one another by the amount of
information available to the local sensors. In applications such as power-constrained sensor networks, one
cannot assume that the decision-maker and sensors can communicate over a high-bandwidth channel, nor
that the sensors have unbounded memory. Most suited to this perspective (and the focus of this paper) is
Case A, in which the local decisions are of the simplified form $U_n = \phi_n(X_n)$; i.e., neither local memory nor
feedback is assumed to be available. Noting that Case A is not amenable to dynamic programming and hence
presumably intractable, Veeravalli et al. [22] suggested restricting the analysis to the class of stationary local
decision functions; i.e., local decision functions $\phi_n$ that are independent of $n$. They conjectured that
stationary decision functions might actually be optimal in the setting of Case A (given the intuitive symmetry
and high degree of independence of the problem in this case), even though it is not possible to verify this
optimality via DP arguments. This conjecture has remained open since it was first posed by Veeravalli et
al. [22, 21].

The main contribution of this paper is to resolve this question by showing that stationary decision func- 
tions are, in fact, not optimal for decentralized problems of type A. Our argument is based on an asymptotic 
characterization of the optimal Bayesian risk as the cost per sample goes to zero. In this asymptotic regime, 
the optimal cost can be expressed as a simple function of priors and Kullback-Leibler (KL) divergences.
This characterization allows us to construct counterexamples to the stationarity conjecture, both in an exact
and an asymptotic setting. In the latter setting, we present a class of problems in which there always exists
a range of prior probabilities for which stationary strategies, either deterministic or randomized, are suboptimal.
We note in passing that an intuition for the source of the suboptimality is easily provided: it is due
to the asymmetry of the KL divergence.

It is well known that optimal quantizers, when unrestricted, are necessarily likelihood-based threshold
rules [19, ?]. Our counterexamples and analysis imply that optimal thresholds are not generally stationary
(i.e., the threshold may differ from sample to sample). We also provide a partial converse to this result: 
specifically, if we restrict ourselves to stationary (or blockwise stationary) quantizer designs, then there 
exists an optimal design that is a deterministic threshold rule based on the likelihood ratio. We prove this 
result by establishing a quasiconcavity result for the asymptotically optimal cost function. 

It is worth highlighting several limitations in our results. For the suboptimality of stationary quantizers, 
our analysis is applicable only to finite classes of deterministic quantizers and their convex hull of randomized
quantizers, and under the assumption that the likelihood ratio of the two hypotheses is bounded from
both above and below. Such assumptions certainly hold for arbitrary discrete distributions with finite support.
It remains an open problem to consider more general classes of distributions. For the likelihood-ratio
characterization result, our proof works only for the (possibly infinite) classes of deterministic quantizers 
with arbitrary output alphabets, as well as for the class of randomized quantizers with binary outputs. We 
conjecture that the same result holds more generally for randomized quantizers with arbitrary output alpha- 
bets. 

The remainder of this paper is organized as follows. We begin in Section 2 with background on the
Bayesian formulation of sequential detection problems, and Wald's approximation. Section 3 provides a
simple asymptotic approximation of the optimal cost that underlies our main analysis in Section 4. In
Section 5, we establish the existence of optimal decision rules that are likelihood-based threshold rules,
under the restriction to blockwise stationarity. We conclude with a discussion in Section 6.



2 Background 

This section provides background on the Bayesian formulation of sequential (centralized) detection problems.
Of particular use in our subsequent analysis is Wald's approximation of the cost of the optimal sequential
test.

Let $\mathbb{P}_0$ and $\mathbb{P}_1$ represent the distribution of $X$, when conditioned on $\{H = 0\}$ and $\{H = 1\}$ respectively.
Assume that $\mathbb{P}_0$ and $\mathbb{P}_1$ are absolutely continuous with respect to one another. We use $f^0(x)$ and $f^1(x)$
to denote the respective density functions with respect to some dominating measure (e.g., Lebesgue for
continuous variables, or counting measure for discrete-valued variables).

Our focus is the Bayesian formulation of the sequential detection problem [15, 21]; accordingly, we let
$\pi^1 = \mathbb{P}(H = 1)$ and $\pi^0 = \mathbb{P}(H = 0)$ denote the prior probabilities of the two hypotheses. Let $X_1, X_2, \ldots$
be a sequence of conditionally i.i.d. realizations of $X$. A sequential decision rule consists of a stopping
time $N$ defined with respect to the sigma field $\sigma(X_1, \ldots, X_n)$, and a decision function $\gamma$ measurable with
respect to $\sigma(X_1, \ldots, X_N)$. The cost function is the expectation of a weighted sum of the sample size $N$ and
the probability of incorrect decision, namely

$$J(N, \gamma) := \mathbb{E}\{cN + \mathbb{I}[\gamma(X_1, \ldots, X_N) \neq H]\}, \tag{1}$$

where $c > 0$ is the incremental cost of each sample. The overall goal is to choose the pair $(N, \gamma)$ so as to
minimize the expected loss (1).

It is well known that the optimal solution of the sequential decision problem can be characterized recursively
using dynamic programming (DP) arguments [1, 25, 15, ?]. Although useful in classical (centralized)
sequential detection, the DP approach is not always straightforward to apply to decentralized versions of
sequential detection [21]. In the remainder of this section, we describe an asymptotic approximation of the
optimal sequential cost, originally due to Wald (cf. [16]), valid as $c \to 0$. To sketch out Wald's approximation,
we begin by noting that the optimal stopping rule for the cost function (1) takes the form

$$N = \inf\Big\{ n \geq 1 \;\Big|\; L_n(X_1, \ldots, X_n) := \sum_{i=1}^{n} \log \frac{f^1(X_i)}{f^0(X_i)} \notin (a, b) \Big\}, \tag{2}$$

for some real numbers $a < b$. Given this stopping rule, the optimal decision function has the form

$$\gamma(L_N) = \begin{cases} 1 & \text{if } L_N \geq b, \\ 0 & \text{if } L_N \leq a. \end{cases} \tag{3}$$
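The stopping rule (2) and decision function (3) can be sketched in a few lines of Python. This is only an illustration for discrete observations on a finite sample path: the function name `sprt`, the list-based densities, and the convention of returning `None` when neither threshold is crossed are assumptions made here, not part of the original formulation.

```python
import math

def sprt(samples, f0, f1, a, b):
    """Sequential test of Eqs. (2)-(3) on a finite list of discrete samples.

    f0[x] and f1[x] are the probabilities of symbol x under H = 0 and H = 1.
    Returns (n, decision); decision is None if neither threshold is crossed
    before the samples run out (a convention chosen for this sketch).
    """
    llr = 0.0
    for n, x in enumerate(samples, start=1):
        llr += math.log(f1[x] / f0[x])   # running log likelihood ratio L_n
        if llr >= b:
            return n, 1                  # upper threshold crossed: decide H = 1
        if llr <= a:
            return n, 0                  # lower threshold crossed: decide H = 0
    return len(samples), None
```

For instance, with the hypothetical densities `f0 = [0.8, 0.2]`, `f1 = [1/3, 2/3]` and thresholds `a = -2`, `b = 3`, a run of three consecutive observations of symbol 1 is enough to cross the upper threshold.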

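Consider the two types of error:

$$\alpha = \mathbb{P}_0(\gamma(L_N) \neq H) = \mathbb{P}_0(L_N \geq b),$$
$$\beta = \mathbb{P}_1(\gamma(L_N) \neq H) = \mathbb{P}_1(L_N \leq a).$$

As $c \to 0$, it can be shown that the optimal choice of $a$ and $b$ satisfies $a \to -\infty$, $b \to \infty$, and the corresponding
$\alpha, \beta$ satisfy $\alpha + \beta \to 0$. Ignoring the overshoot of $L_N$ upon the optimal stopping time (i.e.,
instead assuming $L_N$ attains precisely the value $a$ or $b$) we can express $a$, $b$, $\mathbb{E}N$ and the cost function $J$ in
terms of $\alpha$ and $\beta$ as follows [24]:

$$a \approx a(\alpha, \beta) := \log \frac{\beta}{1 - \alpha} \quad \text{and} \quad b \approx b(\alpha, \beta) := \log \frac{1 - \beta}{\alpha}, \tag{4}$$

$$\mathbb{E}_0[L_N] \approx (1 - \alpha) a + \alpha b \quad \text{and} \quad \mathbb{E}_1[L_N] \approx (1 - \beta) b + \beta a. \tag{5}$$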
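Now define the Kullback-Leibler divergences

$$D^1 = \mathbb{E}_1\Big[\log \frac{f^1(X)}{f^0(X)}\Big] = D(f^1 \| f^0) \quad \text{and} \quad D^0 = -\mathbb{E}_0\Big[\log \frac{f^1(X)}{f^0(X)}\Big] = D(f^0 \| f^1). \tag{6}$$

With a slight abuse of notation, we shall also use $D(\alpha, \beta)$ to denote a function from $[0, 1]^2$ to $\mathbb{R}$ such that:

$$D(\alpha, \beta) := \alpha \log \frac{\alpha}{\beta} + (1 - \alpha) \log \frac{1 - \alpha}{1 - \beta}.$$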

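With the above approximations, the cost function $J$ of the decision rule based on envelopes $a$ and $b$ can be
written as

$$J = \pi^1 \mathbb{E}_1(cN + \mathbb{I}[L_N \leq a]) + \pi^0 \mathbb{E}_0(cN + \mathbb{I}[L_N \geq b])$$
$$= c\pi^1 \frac{\mathbb{E}_1[L_N]}{D^1} + c\pi^0 \frac{\mathbb{E}_0[-L_N]}{D^0} + \pi^0 \alpha + \pi^1 \beta \tag{7}$$
$$\approx c\pi^0 \frac{D(\alpha, 1 - \beta)}{D^0} + c\pi^1 \frac{D(1 - \beta, \alpha)}{D^1} + \pi^0 \alpha + \pi^1 \beta, \tag{8}$$

where the second line follows from Wald's equation [24] and the third from the approximations (5).
Let $J(\alpha, \beta)$ denote the approximation (8) of $J$.
Let $J^*$ denote the cost of an optimal sequential test, i.e.,

$$J^* = \inf J. \tag{9}$$

A useful result due to Chernoff [?] states that under certain assumptions (to be elaborated in the next section),
$J^*$ has the following form:

$$J^* = \Big(\frac{\pi^0}{D^0} + \frac{\pi^1}{D^1}\Big) c \log c^{-1} (1 + o(1)). \tag{10}$$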
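The approximations (4), (5), (8) and the leading term of (10) are simple enough to evaluate numerically. The following Python sketch does so; the function names are chosen here for illustration only, and the error probabilities are assumed to lie strictly between 0 and 1.

```python
import math

def binary_kl(p, q):
    """D(p, q) = p log(p/q) + (1 - p) log((1 - p)/(1 - q)), for p, q in (0, 1)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def wald_thresholds(alpha, beta):
    """Approximate thresholds (a, b) of Eq. (4), ignoring overshoot."""
    a = math.log(beta / (1 - alpha))
    b = math.log((1 - beta) / alpha)
    return a, b

def approx_cost(alpha, beta, c, pi0, pi1, d0, d1):
    """Approximate Bayes cost J(alpha, beta) of Eq. (8)."""
    return (c * pi0 * binary_kl(alpha, 1 - beta) / d0
            + c * pi1 * binary_kl(1 - beta, alpha) / d1
            + pi0 * alpha + pi1 * beta)

def leading_term(c, pi0, pi1, d0, d1):
    """Leading term of Eq. (10), without the (1 + o(1)) factor."""
    return (pi0 / d0 + pi1 / d1) * c * math.log(1 / c)
```

A useful sanity check is that the right-hand sides of (5) coincide with $-D(\alpha, 1 - \beta)$ and $D(1 - \beta, \alpha)$ respectively, which is exactly the substitution that turns (7) into (8).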



3 Characterization of optimal stationary quantizers 

Turning now to the decentralized setting, the primary challenge lies in the design of the quantization rules
applied to data $X_n$. When $X_n$ is univariate, a deterministic quantization rule $\phi_n$ is a function that maps
$\mathcal{X}$ to the discrete space $\mathcal{U} = \{0, \ldots, K - 1\}$ for some natural number $K$. For multivariate $X_n$ with $d$
dimensions arising in the multiple sensor setting, a deterministic quantizer $\phi_n$ is defined as a mapping from
the $d$-dimensional product space $\mathcal{X}$ to $\mathcal{U} = \{0, \ldots, K - 1\}^d$. In the decentralized problem defined as Case
A by Veeravalli et al. [21], the function $\phi_n$ is composed of $d$ separate quantizer functions, one for each
dimension. A randomized quantizer is obtained by placing a distribution over the space of deterministic
quantizers.

Any fixed set of quantization rules $\phi_n$ yields a sequence of compressed data $U_n = \phi_n(X_n)$, to which the
classical theory can be applied. We are thus interested in choosing quantization rules $\phi_1, \phi_2, \ldots$ so that the
error resulting from applying the optimal sequential test to the sequence of statistics $U_1, U_2, \ldots$ is minimized
over some space $\Phi$ of quantization rules. For a given quantizer $\phi_n$, we use

$$f^i_{\phi_n}(u) := \mathbb{P}_i(\phi_n(X_n) = u), \quad \text{for } i = 0, 1,$$

to denote the distributions of the compressed data, conditioned on the hypothesis. In general, when randomized
quantizers are allowed, the vector $(f^0_{\phi_n}(\cdot), f^1_{\phi_n}(\cdot))$ ranges over a convex set, denoted conv $\Phi$, whose
extreme points correspond to deterministic quantizers based on likelihood ratio threshold rules [18].

We say that a quantizer design is stationary if the rule $\phi_n$ is independent of $n$; in this case, we simplify the
notation to $f^1_\phi$ and $f^0_\phi$. In addition, we define the KL divergences $D^1_\phi := D(f^1_\phi \| f^0_\phi)$ and $D^0_\phi := D(f^0_\phi \| f^1_\phi)$.
Moreover, let $J_\phi$ and $J^*_\phi$ denote the analogues of the functions $J$ in Eq. (7) and $J^*$ in Eq. (9), respectively,
defined using $D^i_\phi$, for $i = 0, 1$. In this scenario, the sequence of compressed data $U_1, \ldots, U_n, \ldots$ are drawn
i.i.d. from either $f^0_\phi$ or $f^1_\phi$. Thus we can use the approximation (10) to characterize the asymptotically
optimal stationary quantizer design. This is stated formally in the lemma to follow.

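We begin by stating the assumptions underlying the lemma. For a given class of quantizers $\Phi$, we assume
that the Kullback-Leibler divergences are uniformly bounded away from zero

$$D(f^1_\phi \| f^0_\phi) > 0, \quad D(f^0_\phi \| f^1_\phi) > 0 \quad \text{for all } \phi \in \Phi, \tag{11}$$

and moreover that the variances of the log likelihood ratios are bounded

$$\sup_{\phi \in \Phi} \mathrm{Var}_{f^1_\phi} \log(f^1_\phi / f^0_\phi) < \infty, \quad \text{and} \quad \sup_{\phi \in \Phi} \mathrm{Var}_{f^0_\phi} \log(f^1_\phi / f^0_\phi) < \infty. \tag{12}$$

Lemma 1. (a) Under assumptions (11) and (12), the optimal stationary cost takes the form

$$J^*_\phi = \Big(\frac{\pi^0}{D^0_\phi} + \frac{\pi^1}{D^1_\phi}\Big) c \log c^{-1} (1 + r_\phi), \tag{13}$$

where $|r_\phi| = o(1)$ as $c \to 0$.

(b) If $\sup_{\phi \in \Phi} \max\{\log(f^0_\phi / f^1_\phi), \log(f^1_\phi / f^0_\phi)\} \leq M$ for some constant $M$, then (13) holds with
$\sup_{\phi \in \Phi} |r_\phi| = o(1)$ as $c \to 0$.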

Proof: (a) This part is immediate from a combination of Theorems 1 and 2 of Chernoff [?].

(b) We begin by bounding the error in the approximation (8). By definition of the stopping time $N$, we
have either (i) $b \leq L_N \leq b + M$ or (ii) $a - M \leq L_N \leq a$. By standard arguments due to Wald [24], it
is simple to obtain $e^{b} \alpha \leq 1 - \beta \leq e^{b + M} \alpha$, or equivalently $b \leq b(\alpha, \beta) = \log \frac{1 - \beta}{\alpha} \leq b + M$. Similar
reasoning for case (ii) yields $a - M \leq a(\alpha, \beta) = \log \frac{\beta}{1 - \alpha} \leq a$. Now, note that

$$\mathbb{E}_0 L_N = \alpha \, \mathbb{E}_0[L_N \mid L_N \geq b] + (1 - \alpha) \, \mathbb{E}_0[L_N \mid L_N \leq a].$$

Conditioning on the event $L_N \in [b, b + M]$, we have $|L_N - b(\alpha, \beta)| \leq M$. Similarly, conditioning on the
event $L_N \in [a - M, a]$, we have $|L_N - a(\alpha, \beta)| \leq M$. This yields $|\mathbb{E}_0 L_N - (-D(\alpha, 1 - \beta))| \leq M$. Similar
reasoning yields $|\mathbb{E}_1 L_N - D(1 - \beta, \alpha)| \leq M$. Let $J_\phi(\alpha, \beta)$ denote the approximation (8) of $J_\phi$. We obtain:

$$|J_\phi - J_\phi(\alpha, \beta)| \leq 2cM.$$

Note that the approximation error bound is independent of $\phi$. Thus, it suffices to establish the asymptotic
behavior (13) for the quantity $\inf_{\alpha, \beta} J_\phi(\alpha, \beta)$, where the infimum is taken over pairs of realizable error
probabilities $(\alpha, \beta)$. Moreover, we only need to consider the asymptotic regime $\alpha + \beta \to 0$, since the error
probabilities $\alpha$ and $\beta$ vanish as $c \to 0$. It is simple to see that $D(1 - \beta, \alpha) = \log(1/\alpha)(1 + o(1))$, and
$D(1 - \alpha, \beta) = \log(1/\beta)(1 + o(1))$. Hence, $\min_{\alpha, \beta} J_\phi(\alpha, \beta)$ can be expressed as

$$\inf_{\alpha, \beta} \Big\{ \pi^0 \alpha + \pi^1 \beta + c\pi^0 \frac{\log(1/\beta)}{D^0_\phi} + c\pi^1 \frac{\log(1/\alpha)}{D^1_\phi} \Big\} (1 + o(1)). \tag{14}$$

This infimum, taken over all positive $(\alpha, \beta)$, is achieved at $\alpha^* = \frac{c\pi^1}{D^1_\phi \pi^0}$ and $\beta^* = \frac{c\pi^0}{D^0_\phi \pi^1}$. Plugging the quantities
$\alpha^*$ and $\beta^*$ into Eq. (14) yields (13). Note that the asymptotic quantity $o(1)$ in (13) is absolutely bounded
by $\alpha^* + \beta^* \to 0$ uniformly for all quantizers $\phi$, because $D^0_\phi$ and $D^1_\phi$ are uniformly bounded away from zero
due to the Lemma's assumption.

It remains to show that error probabilities $(\alpha^*, \beta^*)$ can be approximately realized by using a sufficiently
large threshold $b > 0$ and small threshold $a < 0$ while incurring an approximation cost of order $O(c)$
uniformly for all $\phi$. Indeed, let us choose thresholds $a'$ and $b'$ such that $e^{-(b' + M)}/2 \leq \alpha^* \leq e^{-b'}$, and
$e^{a' - M}/2 \leq \beta^* \leq e^{a'}$. Let $\alpha'$ and $\beta'$ be the corresponding errors associated with these two thresholds. As
before, we also have $\alpha' \in (e^{-(b' + M)}/2, e^{-b'})$ and $\beta' \in (e^{a' - M}/2, e^{a'})$. Clearly,
$|\alpha^* - \alpha'| \leq e^{-b'}(1 - e^{-M}/2) = O(\alpha^*) = O(c)$. Similarly, $|\beta^* - \beta'| = O(c)$. By the mean value theorem,

$$|\log(1/\alpha^*) - \log(1/\alpha')| \leq 2 |\alpha^* - \alpha'| e^{b' + M} \leq 2 e^{M} (1 - e^{-M}/2) = O(1).$$

Similarly, $\log(1/\beta^*) - \log(1/\beta') = O(1)$. Hence, the approximation of $(\alpha^*, \beta^*)$ by the realizable $(\alpha', \beta')$
incurs a cost at most $O(c)$. Furthermore, the constant in the asymptotic bound $O(c)$ is independent of the
quantizer $\phi$.

□
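The minimization step behind (14) can be checked numerically. The sketch below evaluates the bracketed expression in (14), the stationary point obtained by setting both partial derivatives to zero, and the leading term of (13). The function names and the particular constants used in the check are illustrative assumptions.

```python
import math

def objective(alpha, beta, c, pi0, pi1, d0, d1):
    """Bracketed expression in Eq. (14), without the (1 + o(1)) factor."""
    return (pi0 * alpha + pi1 * beta
            + c * pi0 * math.log(1 / beta) / d0
            + c * pi1 * math.log(1 / alpha) / d1)

def minimizer(c, pi0, pi1, d0, d1):
    """(alpha*, beta*) from setting the partial derivatives of Eq. (14) to zero."""
    return c * pi1 / (d1 * pi0), c * pi0 / (d0 * pi1)

def leading_term(c, pi0, pi1, d0, d1):
    """Leading term of Eq. (13): (pi0/d0 + pi1/d1) c log(1/c)."""
    return (pi0 / d0 + pi1 / d1) * c * math.log(1 / c)

def ratio(c, pi0, pi1, d0, d1):
    """Minimized objective divided by the leading term; tends to 1 as c -> 0."""
    a_star, b_star = minimizer(c, pi0, pi1, d0, d1)
    return objective(a_star, b_star, c, pi0, pi1, d0, d1) / leading_term(c, pi0, pi1, d0, d1)
```

Because the objective is a sum of convex functions of $\alpha$ and of $\beta$ separately, the stationary point is the global minimizer. The convergence of the ratio to one is only logarithmic in $c$, which is consistent with the $o(1)$ term in (13) being a slowly vanishing correction.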

Remarks: 

1. If $\Phi$ is a finite class of quantizers, or a convex hull of a finite class of quantizers, the assumption in part
(b) of Lemma 1 holds. It also holds in the case of discrete distributions and continuous distributions with
bounded support. However, it would be interesting to relax this assumption so as to cover distributions
with unbounded support.

2. The preceding approximation of the optimal cost essentially ignores the overshoot of the likelihood
ratio $L_N$. While it is possible to analyze this overshoot to obtain a finer approximation (cf. [11, 16, 10, ?]),
this is not needed for our purpose. Lemma 1 shows that given a fixed prior
$(\pi^0, \pi^1)$, among all stationary quantizer designs in $\Phi$, $\phi$ is optimal for sufficiently small $c$ if and only
if $\phi$ minimizes what we shall call the sequential cost coefficient:

$$G_\phi := \frac{\pi^0}{D^0_\phi} + \frac{\pi^1}{D^1_\phi}.$$

3. As a consequence of Lemma 7, to be proved in the sequel, if we consider the class of all binary
randomized quantizers, then the sequential cost coefficient is a quasiconcave function with respect to
$(f^0_\phi(\cdot), f^1_\phi(\cdot))$. (A function $F$ is quasiconcave if and only if for any $\eta$, the level set $\{F(x) \geq \eta\}$ is a
convex set; see Boyd and Vandenberghe [4] for further background.) The minimum of a quasiconcave
function lies in the set of extreme points in its domain. For the set conv $\Phi$, these extreme points can
be realized by deterministic quantizers based on likelihood ratios [20]. Consequently, we conclude
that for quantizers with binary outputs, the optimal cost is not decreased by considering randomized
quantizers. We conjecture that this statement also holds beyond the binary case.

Section 5 is devoted to a more detailed study of asymptotically optimal stationary quantizers. In the
meantime, we turn to the question of whether stationary quantizers are optimal in either finite-sample or
asymptotic settings.

4 Suboptimality of stationary designs 

It was shown by Tsitsiklis [19] that optimal quantizers $\phi_n$ take the form of threshold rules based on the
likelihood ratio $f^1(X_n)/f^0(X_n)$. Veeravalli et al. [22, 21] asked whether these rules can always be taken to
be stationary, a conjecture that has remained open. In this section, we resolve this question with a negative
answer in both the finite-sample and asymptotic settings.

4.1 Suboptimality in exact setting 

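We begin by providing a numerical counterexample for which stationary designs are suboptimal. Consider
a problem in which $X \in \mathcal{X} = \{1, 2, 3\}$ and the conditional distributions take the form

$$f^0(x) = \Big[\tfrac{8}{10} \;\; \tfrac{1999}{10000} \;\; \tfrac{1}{10000}\Big] \quad \text{and} \quad f^1(x) = \Big[\tfrac{1}{3} \;\; \tfrac{1}{3} \;\; \tfrac{1}{3}\Big].$$

Suppose that the prior probabilities are $\pi^1 = \tfrac{8}{100}$ and $\pi^0 = \tfrac{92}{100}$, and that the cost for each sample is $c = \tfrac{1}{100}$.

If we restrict to binary quantizers (i.e., $\mathcal{U} = \{0, 1\}$), by the symmetric roles of the output alphabets there
are only three possible deterministic quantizers:

1. Design A: $\phi_A(X_n) = 0 \iff X_n = 1$. As a result, the corresponding distribution for $U_n$ is specified
by $f^0_{\phi_A}(u) = [\tfrac{4}{5} \;\; \tfrac{1}{5}]$ and $f^1_{\phi_A}(u) = [\tfrac{1}{3} \;\; \tfrac{2}{3}]$.

2. Design B: $\phi_B(X_n) = 0 \iff X_n \in \{1, 2\}$. The corresponding distribution for $U_n$ is given by
$f^0_{\phi_B}(u) = [\tfrac{9999}{10000} \;\; \tfrac{1}{10000}]$ and $f^1_{\phi_B}(u) = [\tfrac{2}{3} \;\; \tfrac{1}{3}]$.

3. Design C: $\phi_C(X_n) = 0 \iff X_n \in \{1, 3\}$. The corresponding distribution for $U_n$ is specified by
$f^0_{\phi_C}(u) = [\tfrac{8001}{10000} \;\; \tfrac{1999}{10000}]$ and $f^1_{\phi_C}(u) = [\tfrac{2}{3} \;\; \tfrac{1}{3}]$.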
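The induced distributions of the three designs follow mechanically from the raw distributions, by summing the probabilities of the symbols mapped to output 0. A short Python sketch with exact rational arithmetic is given below; the dictionary layout and the helper name `induced` are choices made for this illustration.

```python
from fractions import Fraction as F

# Raw conditional distributions on X = {1, 2, 3} from the example.
f0 = {1: F(8, 10), 2: F(1999, 10000), 3: F(1, 10000)}
f1 = {1: F(1, 3), 2: F(1, 3), 3: F(1, 3)}

# Each binary design is described by the set of symbols it maps to output 0;
# all remaining symbols are mapped to output 1.
designs = {"A": {1}, "B": {1, 2}, "C": {1, 3}}

def induced(f, zero_set):
    """Distribution [P(U = 0), P(U = 1)] of the quantized variable."""
    p_zero = sum(p for x, p in f.items() if x in zero_set)
    return [p_zero, 1 - p_zero]

for name, zero_set in designs.items():
    print(name, induced(f0, zero_set), induced(f1, zero_set))
```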

Now consider the three stationary strategies, each of which uses only one fixed design, A, B or C. For
any given stationary quantization rule $\phi$, we have a classical centralized sequential problem, for which the
optimal cost (achieved by a sequential probability ratio test) can be computed using a dynamic-programming
procedure [25, 1]. Accordingly, for each stationary strategy, we compute the optimal cost function $J$ for 10^
points on the $p$-axis by performing 300 updates of Bellman's equation (cf. [1]). In all cases, the difference
in cost between the 299th and 300th updates is less than 10^^. Let $J_A$, $J_B$ and $J_C$ denote the optimal cost
functions for sequential tests using all A's, all B's, and all C's, respectively. When evaluated at $\pi^1 = 0.08$,
these computations yield $J_A = 0.0567$, $J_B = 0.0532$ and $J_C = 0.08$.

Finally, we consider a non-stationary rule obtained by applying design A for only the first sample, and
applying design B for the remaining samples. Again using Bellman's equation, we find that the cost for this
design is

$$J^* = \min\{\min\{\pi^1, 1 - \pi^1\},\; c + J_B(\mathbb{P}(H = 1 \mid u_1 = 0)) \mathbb{P}(u_1 = 0) + J_B(\mathbb{P}(H = 1 \mid u_1 = 1)) \mathbb{P}(u_1 = 1)\} = 0.052767,$$

which is better than any of the stationary strategies.

In this particular example, the cost $J^*$ of the non-stationary quantizer yields a slim improvement (0.0004)
over the best stationary rule $J_B$. This slim margin is due in part to the choice of a small per-sample cost
$c = 0.01$; however, larger values of $c$ do not yield a counterexample when using the particular distributions
specified above. A more significant factor is that our non-stationary rule differs from the optimal stationary
rule B only in its treatment of the first sample. This fact suggests that one might achieve better cost by
alternating between using design A and design B on the odd and even samples, respectively. Our analysis
of the asymptotic setting in the next section confirms this intuition.
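The dynamic-programming computation described in this subsection can be sketched as follows. This is a minimal illustration rather than the procedure actually used for the reported numbers: the grid of 501 points, the linear interpolation between grid points, and all function names are assumptions made here, so the printed values need not agree with the reported ones to all digits.

```python
def make_interp(grid_size):
    """Linear interpolation of a function tabulated on a uniform grid over [0, 1]."""
    def interp(values, p):
        pos = min(max(p, 0.0), 1.0) * (grid_size - 1)
        lo = min(int(pos), grid_size - 2)
        w = pos - lo
        return (1 - w) * values[lo] + w * values[lo + 1]
    return interp

def lookahead(p, f0, f1, c, J, interp):
    """One Bellman update at belief p = P(H = 1): stop now, or pay c, observe
    one quantized sample with distributions (f0, f1), and then incur J."""
    cont = c
    for u in range(len(f0)):
        pu = p * f1[u] + (1 - p) * f0[u]             # P(U = u) under belief p
        if pu > 0:
            cont += pu * interp(J, p * f1[u] / pu)   # J at the posterior belief
    return min(p, 1 - p, cont)

def stationary_cost(f0, f1, c, grid_size=501, sweeps=300):
    """Repeated Bellman updates for a stationary design, started from the stopping cost."""
    interp = make_interp(grid_size)
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    J = [min(p, 1 - p) for p in grid]
    for _ in range(sweeps):
        J = [lookahead(p, f0, f1, c, J, interp) for p in grid]
    return J, interp

c, prior = 0.01, 0.08
A = ([0.8, 0.2], [1 / 3, 2 / 3])          # induced distributions of design A
B = ([0.9999, 0.0001], [2 / 3, 1 / 3])    # induced distributions of design B
J_A, interp = stationary_cost(*A, c)
J_B, _ = stationary_cost(*B, c)
J_AB = lookahead(prior, *A, c, J_B, interp)   # design A once, then design B
print(interp(J_A, prior), interp(J_B, prior), J_AB)
```

The last call mirrors the displayed expression for $J^*$: a single Bellman update with design A applied to the converged cost function of design B.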

4.2 Asymptotic suboptimality for both deterministic and randomized quantizers 

We now prove that in a broad class of examples, there is a range of prior probabilities for which stationary
quantizer designs are suboptimal. Our result stems from the following observation: Lemma 1 implies that
in order to achieve a small cost we need to choose a quantizer $\phi$ for which the KL divergences
$D^0_\phi := D(f^0_\phi \| f^1_\phi)$ and $D^1_\phi := D(f^1_\phi \| f^0_\phi)$ are both as large as possible. Due to the asymmetry of the KL divergence,
however, these maxima are not necessarily achieved by a single quantizer $\phi$. This suggests that one could
improve upon stationary designs by applying different quantizers to different samples, as the following
lemma shows.

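Lemma 2. Let $\phi_1$ and $\phi_2$ be any two quantizers. If the following inequalities hold

$$D^0_{\phi_1} < D^0_{\phi_2} \quad \text{and} \quad D^1_{\phi_1} > D^1_{\phi_2}, \tag{15}$$

then there exists a non-empty interval $(U, V) \subseteq (0, +\infty)$ such that as $c \to 0$,

$$\begin{aligned}
J^*_{\phi_1} \leq J^*_{\phi_1, \phi_2} \leq J^*_{\phi_2} & \quad \text{if } \pi^0/\pi^1 \leq U, \\
J^*_{\phi_1, \phi_2} < \min\{J^*_{\phi_1}, J^*_{\phi_2}\} & \quad \text{if } \pi^0/\pi^1 \in (U, V), \\
J^*_{\phi_1} \geq J^*_{\phi_1, \phi_2} \geq J^*_{\phi_2} & \quad \text{if } \pi^0/\pi^1 \geq V,
\end{aligned}$$

where $J^*_{\phi_1, \phi_2}$ denotes the optimal cost of a sequential test that alternates between using $\phi_1$ and $\phi_2$ on odd
and even samples respectively.

Proof: According to Lemma 1, we have

$$J^*_{\phi_i} = \Big(\frac{\pi^0}{D^0_{\phi_i}} + \frac{\pi^1}{D^1_{\phi_i}}\Big) c \log c^{-1} (1 + o(1)), \quad i = 1, 2. \tag{16}$$

Now consider the sequential test that applies quantizers $\phi_1$ and $\phi_2$ alternately to odd and even samples.
Furthermore, let this test consider two samples at a time. Let $f^0_{\phi_1 \phi_2}$ and $f^1_{\phi_1 \phi_2}$ denote the induced conditional
probability distributions, jointly on the odd-even pairs of quantized variables. From the additivity of the KL
divergence and assumption (15), there holds:

$$D(f^0_{\phi_1 \phi_2} \| f^1_{\phi_1 \phi_2}) = D^0_{\phi_1} + D^0_{\phi_2} > 2 D^0_{\phi_1}, \tag{17a}$$
$$D(f^1_{\phi_1 \phi_2} \| f^0_{\phi_1 \phi_2}) = D^1_{\phi_1} + D^1_{\phi_2} < 2 D^1_{\phi_1}. \tag{17b}$$

Clearly, the cost of the proposed sequential test is an upper bound for $J^*_{\phi_1, \phi_2}$. Furthermore, the gap between
this upper bound and the true optimal cost is no more than $O(c)$. Hence, as in the proof of Lemma 1, as
$c \to 0$, the optimal cost $J^*_{\phi_1, \phi_2}$ can be written as

$$\Big(\frac{2\pi^0}{D^0_{\phi_1} + D^0_{\phi_2}} + \frac{2\pi^1}{D^1_{\phi_1} + D^1_{\phi_2}}\Big) c \log c^{-1} (1 + o(1)). \tag{18}$$

From equations (16) and (18), simple calculations yield the claim with

$$U = \frac{D^0_{\phi_1} (D^0_{\phi_1} + D^0_{\phi_2}) (D^1_{\phi_1} - D^1_{\phi_2})}{D^1_{\phi_1} (D^1_{\phi_1} + D^1_{\phi_2}) (D^0_{\phi_2} - D^0_{\phi_1})} \quad \text{and} \quad V = \frac{D^0_{\phi_2} (D^0_{\phi_1} + D^0_{\phi_2}) (D^1_{\phi_1} - D^1_{\phi_2})}{D^1_{\phi_2} (D^1_{\phi_1} + D^1_{\phi_2}) (D^0_{\phi_2} - D^0_{\phi_1})}. \tag{19}$$

□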

Example: Let us return to the example provided in the previous subsection. Note that the two quantizers $\phi_A$
and $\phi_B$ satisfy assumption (15) (with $\phi_1 = \phi_B$ and $\phi_2 = \phi_A$), since $D(f^0_{\phi_B} \| f^1_{\phi_B}) = 0.4045 < D(f^0_{\phi_A} \| f^1_{\phi_A}) = 0.4596$ and
$D(f^1_{\phi_B} \| f^0_{\phi_B}) = 2.4337 > D(f^1_{\phi_A} \| f^0_{\phi_A}) = 0.5108$. Furthermore, both quantizers dominate $\phi_C$ in terms of KL divergences:
$D(f^0_{\phi_C} \| f^1_{\phi_C}) = 0.0438$, $D(f^1_{\phi_C} \| f^0_{\phi_C}) = 0.0488$. As a result, there exists a range of priors for which a
sequential test using a stationary quantizer design (either $\phi_A$, $\phi_B$ or $\phi_C$ for all samples) is not optimal.
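The divergences quoted in this example, and the interval on which alternation helps, can be reproduced with a few lines of Python. The sketch below is illustrative: it takes the interval endpoints to be the values of $\pi^0/\pi^1$ at which the alternating coefficient $2\pi^0/(D^0_{\phi_1} + D^0_{\phi_2}) + 2\pi^1/(D^1_{\phi_1} + D^1_{\phi_2})$ equals each stationary coefficient, and the function names are hypothetical.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions (natural log)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

f0 = {"A": [0.8, 0.2], "B": [0.9999, 0.0001], "C": [0.8001, 0.1999]}
f1 = {"A": [1 / 3, 2 / 3], "B": [2 / 3, 1 / 3], "C": [2 / 3, 1 / 3]}
D0 = {k: kl(f0[k], f1[k]) for k in f0}   # D(f^0 || f^1) for each design
D1 = {k: kl(f1[k], f0[k]) for k in f0}   # D(f^1 || f^0) for each design

def cost_coeff(r, d0, d1):
    """pi0/d0 + pi1/d1 for the prior ratio r = pi0/pi1, with pi0 + pi1 = 1."""
    pi1 = 1 / (1 + r)
    return (1 - pi1) / d0 + pi1 / d1

def interval(d0_1, d1_1, d0_2, d1_2):
    """Endpoints (U, V), assuming d0_1 < d0_2 and d1_1 > d1_2 as in (15)."""
    common = (d0_1 + d0_2) * (d1_1 - d1_2) / ((d1_1 + d1_2) * (d0_2 - d0_1))
    return common * d0_1 / d1_1, common * d0_2 / d1_2

# phi_1 = design B and phi_2 = design A satisfy assumption (15).
U, V = interval(D0["B"], D1["B"], D0["A"], D1["A"])
r = math.sqrt(U * V)                     # a prior ratio strictly inside (U, V)
alternating = cost_coeff(r, (D0["A"] + D0["B"]) / 2, (D1["A"] + D1["B"]) / 2)
print(D0, D1, U, V, alternating)
```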

Theorem 3. (a) Suppose that $\Phi$ is a finite collection of quantizers, and that there is no single quantizer $\phi$
that dominates all other quantizers in $\Phi$ in the sense that

$$D^0_\phi \geq D^0_{\phi'} \quad \text{and} \quad D^1_\phi \geq D^1_{\phi'} \quad \text{for all } \phi' \in \Phi. \tag{20}$$

Then there exists a non-empty range of prior probabilities for which no stationary design based on a quantizer
in $\Phi$ is optimal.

(b) For any non-deterministic $\phi$ in the randomized class conv $\Phi$, there exists a non-stationary quantizer
design that has strictly smaller sequential cost coefficient than that of a stationary design based on $\phi$ for
any choice of prior probabilities.

Proof. (a) Since there are a finite number of quantizers in $\Phi$ and no quantizer dominates all others, the
interval $(0, \infty)$ is divided into at least two adjacent non-empty intervals, each of which corresponds to a
range of prior probability ratios $\pi^0/\pi^1$ for which a quantizer is strictly optimal (asymptotically) among all
stationary designs. Let them be $(\delta_1, \delta)$ and $(\delta, \delta_2)$, for two quantizers, namely, $\phi_1$ and $\phi_2$. In particular, $\delta$ is
the value of $\pi^0/\pi^1$ for which the sequential cost coefficients are equal (viz. $G_{\phi_1} = G_{\phi_2}$), which happens
only if assumption (15) holds. Some calculations verify that

$$\delta = \frac{D^0_{\phi_1} D^0_{\phi_2} (D^1_{\phi_1} - D^1_{\phi_2})}{D^1_{\phi_1} D^1_{\phi_2} (D^0_{\phi_2} - D^0_{\phi_1})}. \tag{21}$$

By Lemma 2, a non-stationary design obtained by alternating between $\phi_1$ and $\phi_2$ has smaller sequential cost
than both $\phi_1$ and $\phi_2$ for $\pi^0/\pi^1 \in (U, V)$, where $U$ and $V$ are given in equation (19). Since it can be verified
that $\delta$ as defined in (21) belongs to the interval $(U, V)$, we conclude that for $\pi^0/\pi^1 \in (U, V) \cap (\delta_1, \delta_2)$, this
non-stationary design has smaller cost than any stationary design using $\phi \in \Phi$.

(b) Let $\phi \in$ conv $\Phi$ be a randomized quantizer (i.e., at each step choose with non-zero probabilities
$w_1, \ldots, w_k$ from quantizers $\phi_1, \ldots, \phi_k \in \Phi$, respectively, where $\sum_{i=1}^{k} w_i = 1$). Clearly, the densities
induced by $\phi$ satisfy: $f^0_\phi = \sum_{i=1}^{k} w_i f^0_{\phi_i}$, and $f^1_\phi = \sum_{i=1}^{k} w_i f^1_{\phi_i}$. By the strict convexity of the KL divergence
functional, jointly in its two density arguments [9], Jensen's inequality gives
$D^0_\phi < \sum_{i=1}^{k} w_i D^0_{\phi_i}$ and $D^1_\phi < \sum_{i=1}^{k} w_i D^1_{\phi_i}$. Since $D^0_{\phi_i}$ and $D^1_{\phi_i}$ are bounded from above uniformly for all $\phi_i \in \Phi$,
it is possible to approximate $(w_1, \ldots, w_k)$ by rational numbers of the form $(q_1/N, q_2/N, \ldots, q_k/N)$ for
some natural numbers $q_1, \ldots, q_k$ and $N$ satisfying $\sum_{i=1}^{k} q_i = N$ such that

$$D^0_\phi < \sum_{i=1}^{k} \frac{q_i}{N} D^0_{\phi_i} \quad \text{and} \quad D^1_\phi < \sum_{i=1}^{k} \frac{q_i}{N} D^1_{\phi_i}.$$

Now consider the non-stationary quantizer that applies $\phi_1$ for $q_1$ steps, then $\phi_2$ for $q_2$ steps and so on, up
to $\phi_k$ for $q_k$ steps, yielding a total of $N$ steps, and then repeats this sequence starting again at step $N + 1$.
By construction, this non-stationary quantizer has a smaller cost than that of quantizer $\phi$ for any choice of
prior. □
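The construction in part (b) can be illustrated numerically on designs A and B of Subsection 4.1. The sketch below is an assumption-laden illustration rather than part of the proof: the mixing weights (1/4, 3/4) are chosen arbitrarily, and the helper names are hypothetical. It compares a randomized stationary quantizer with the periodic schedule that applies each design for a matching number of steps.

```python
import math
from fractions import Fraction

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mixture(weights, dists):
    """Distribution induced by picking dists[i] with probability weights[i]."""
    return [sum(w * d[u] for w, d in zip(weights, dists)) for u in range(len(dists[0]))]

def block_schedule(weights, max_period=100):
    """Counts (q_1, ..., q_k) and period N with q_i / N approximating w_i.
    Assumes the weights are well approximated by fractions with small denominators."""
    fracs = [Fraction(w).limit_denominator(max_period) for w in weights]
    period = 1
    for f in fracs:
        period = period * f.denominator // math.gcd(period, f.denominator)
    return [int(f * period) for f in fracs], period

f0s = [[0.8, 0.2], [0.9999, 0.0001]]          # designs A and B under H = 0
f1s = [[1 / 3, 2 / 3], [2 / 3, 1 / 3]]        # designs A and B under H = 1
w = [0.25, 0.75]

# Randomized stationary quantizer: divergences of the mixture distributions.
d0_mix = kl(mixture(w, f0s), mixture(w, f1s))
d1_mix = kl(mixture(w, f1s), mixture(w, f0s))

# Periodic non-stationary schedule: per-sample average of the divergences.
q, N = block_schedule(w)
d0_blk = sum(qi * kl(a, b) for qi, a, b in zip(q, f0s, f1s)) / N
d1_blk = sum(qi * kl(b, a) for qi, a, b in zip(q, f0s, f1s)) / N

def coeff(d0, d1, pi0, pi1):
    """Sequential cost coefficient pi0/d0 + pi1/d1."""
    return pi0 / d0 + pi1 / d1
```

Since both per-sample divergences of the periodic schedule exceed those of the mixture, its cost coefficient is smaller for every prior, which is the content of part (b).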

Remarks: (i) It is worth emphasizing that the assumption that the class $\Phi$ is finite is crucial in part (a) of the
theorem. We do not know if this result can be extended to the case in which $\Phi$ is infinite. (ii) Part (b) shows that
any stationary randomized quantizer is always dominated by some non-stationary one. Actually, a stronger
result can be proved at least for binary quantizers (see Cor. 8): for any given choice of prior probability, any
stationary randomized quantizer is dominated by a stationary deterministic quantizer. (iii) It is interesting
to contrast the Bayesian formulation of the problem of quantizer design with the Neyman-Pearson formulation.
Our results on the suboptimality of stationary quantizer design in the Bayesian formulation rest
on the asymmetry of the Kullback-Leibler divergence, as well as the sensitivity of the optimal quantizers
to the prior probability. We note that Mei [12] (see p. 58) considered the Neyman-Pearson formulation
of this problem. In this formulation, it can be shown that for all sequential tests for which the Type 1 and
Type 2 errors are bounded by $\alpha$ and $\beta$, respectively, as $\alpha + \beta \to 0$, the expected stopping time $\mathbb{E}_0 N$
under hypothesis $H = 0$ is asymptotically minimized by applying a stationary quantizer $\phi^*$ that maximizes
$D(f^0_\phi \| f^1_\phi)$. Similarly, the expected stopping time $\mathbb{E}_1 N$ under hypothesis $H = 1$ is asymptotically
minimized by the stationary quantizer $\phi^{**}$ that maximizes $D(f^1_\phi \| f^0_\phi)$ [?]. In this context, the example in
Subsection 4.1 provides a case in which the quantizers $\phi^*$ and $\phi^{**}$ that maximize the two KL divergences are not the
same, due to the asymmetry, which suggests that there may not exist a stationary quantizer that simultaneously
minimizes both $\mathbb{E}_1 N$ and $\mathbb{E}_0 N$.

4.3 Asymptotic suboptimality in multiple sensor setting 

Our analysis thus far has established that with a single sensor per time step ($d = 1$), applying multiple
quantizers to different samples can reduce the sequential cost. As pointed out by one of the referees, it is
natural to ask whether the same phenomenon persists in the case of multiple sensors ($d > 1$). In this section,
we show that the phenomenon does indeed carry over, more specifically by providing an example in which
stationary strategies are still sub-optimal in comparison to non-stationary ones. The key insight is that we
have only a fixed number of dimensions, whereas as $c \to 0$ we are allowed to take more samples, and each
sample can act as an extra dimension, providing more flexibility for non-stationary strategies.

Suppose that the observation vector $X_n$ at time $n$ is $d$-dimensional, with each component corresponding
to a sensor in a typical decentralized setting. Suppose that the observations from each sensor are assumed
to be independent and identically distributed according to the conditional distributions defined in our earlier
example (see Section 4.1). Of interest are the optimal deterministic binary quantizer designs for all $d$ sensors.
Although there are three possible choices $\phi_A$, $\phi_B$ and $\phi_C$ for each sensor, the quantizer $\phi_C$ is dominated by
the other two, so each sensor should choose either $\phi_A$ or $\phi_B$. Suppose that among these sensors, a subset
of size $k$ choose $\phi_A$, whereas the remaining $d - k$ sensors choose $\phi_B$, for $0 \leq k \leq d$. We thus have
$d + 1$ possible stationary designs to consider. For each $k$, the sequential cost coefficient corresponding to
the associated stationary design takes the form

$$G_k := \frac{\pi^0}{k D^0_{\phi_A} + (d - k) D^0_{\phi_B}} + \frac{\pi^1}{k D^1_{\phi_A} + (d - k) D^1_{\phi_B}}. \tag{22}$$

Now consider the following non-stationary design: the first sensor alternates between decision rules $\phi_A$
and $\phi_B$, while the remaining $d - 1$ sensors simply apply the stationary design based on $\phi_B$. For this design,
the associated sequential cost coefficient is given by

$$G := \frac{2\pi^0}{D^0_{\phi_A} + (2d - 1) D^0_{\phi_B}} + \frac{2\pi^1}{D^1_{\phi_A} + (2d - 1) D^1_{\phi_B}}. \tag{23}$$

Consider the interval $(U, V)$, where the interval has endpoints

$$U = \frac{(D^1_{\phi_B} - D^1_{\phi_A}) (D^0_{\phi_A} + (2d - 1) D^0_{\phi_B}) D^0_{\phi_B}}{(D^0_{\phi_A} - D^0_{\phi_B}) (D^1_{\phi_A} + (2d - 1) D^1_{\phi_B}) D^1_{\phi_B}},$$

$$V = \frac{(D^1_{\phi_B} - D^1_{\phi_A}) (D^0_{\phi_A} + (2d - 1) D^0_{\phi_B}) (D^0_{\phi_A} + (d - 1) D^0_{\phi_B})}{(D^0_{\phi_A} - D^0_{\phi_B}) (D^1_{\phi_A} + (2d - 1) D^1_{\phi_B}) (D^1_{\phi_A} + (d - 1) D^1_{\phi_B})}.$$

Straightforward calculations yield that for any prior likelihood ratio $\pi^0/\pi^1 \in (U, V)$, the minimal cost over
stationary designs $\min_{k = 0, \ldots, d} G_k$ is strictly larger than the sequential cost $G$ of the non-stationary design,
previously defined in equation (23).
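The comparison between (22) and (23) is easy to check numerically for small $d$. The Python sketch below recomputes the divergences of designs A and B from Section 4.1 and evaluates both coefficients at a prior ratio inside $(U, V)$; the function names and the choice of the geometric mean of the endpoints as test point are assumptions made for illustration.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

f0A, f1A = [0.8, 0.2], [1 / 3, 2 / 3]
f0B, f1B = [0.9999, 0.0001], [2 / 3, 1 / 3]
D0A, D1A = kl(f0A, f1A), kl(f1A, f0A)
D0B, D1B = kl(f0B, f1B), kl(f1B, f0B)

def stationary_coeff(k, d, pi0, pi1):
    """Eq. (22): k sensors use design A, the other d - k use design B."""
    return pi0 / (k * D0A + (d - k) * D0B) + pi1 / (k * D1A + (d - k) * D1B)

def alternating_coeff(d, pi0, pi1):
    """Eq. (23): one sensor alternates A and B, the other d - 1 use design B."""
    return (2 * pi0 / (D0A + (2 * d - 1) * D0B)
            + 2 * pi1 / (D1A + (2 * d - 1) * D1B))

def interval(d):
    """Endpoints (U, V) of the range of pi0/pi1 considered in the text."""
    ratio = (D1B - D1A) / (D0A - D0B)
    s0 = D0A + (2 * d - 1) * D0B
    s1 = D1A + (2 * d - 1) * D1B
    U = ratio * s0 * D0B / (s1 * D1B)
    V = ratio * s0 * (D0A + (d - 1) * D0B) / (s1 * (D1A + (d - 1) * D1B))
    return U, V

def compare(d):
    """Return (G, min_k G_k) at the geometric mean of U and V."""
    U, V = interval(d)
    r = math.sqrt(U * V)
    pi1 = 1 / (1 + r)
    pi0 = 1 - pi1
    best_stationary = min(stationary_coeff(k, d, pi0, pi1) for k in range(d + 1))
    return alternating_coeff(d, pi0, pi1), best_stationary
```

For $d = 1$ the two coefficients reduce to the single-sensor setting of Lemma 2 with $\phi_1 = \phi_B$ and $\phi_2 = \phi_A$.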



5 On asymptotically optimal blockwise stationary designs 

Despite the possible loss in optimality, it is useful to consider some form of stationarity in order to
reduce the computational complexity of the optimization and decision process. In this section, we consider
the class of blockwise stationary designs, meaning that there exists some natural number $T$ such that
$\phi_{T+1} = \phi_1$, $\phi_{T+2} = \phi_2$, and so on. For each $T$, let $\mathcal{C}_T$ denote the class of all
blockwise stationary designs with period $T$. We assume throughout the analysis that each decision rule
$\phi_n$ ($n = 1, \ldots, T$) satisfies conditions (11) and (12). Thus, as $T$ increases, we have a hierarchy of
increasingly rich quantizer classes that will be seen to yield progressively better approximations to the
optimal solution.

For a fixed prior $(\pi^0, \pi^1)$ and $T > 0$, let $(\phi_1, \ldots, \phi_T)$ denote a quantizer design in
$\mathcal{C}_T$. As before, the cost of an asymptotically optimal sequential test using this quantizer design is
of order $c \log c^{-1}$, with the sequential cost coefficient

$$G_\phi = \frac{T\pi^0}{D^0_{\phi_1} + \cdots + D^0_{\phi_T}} + \frac{T\pi^1}{D^1_{\phi_1} + \cdots + D^1_{\phi_T}}. \tag{24}$$

$G_\phi$ is a function of the vector of probabilities introduced by the quantizer: $(f^0_\phi(\cdot), f^1_\phi(\cdot))$.
We are interested in the properties of a quantization rule $\phi$ that minimizes $J^*_\phi$.
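
To make the blockwise coefficient concrete, the following sketch computes it from the output distributions that each rule in the period induces under the two hypotheses. The pmfs used here are hypothetical, and `blockwise_cost` is an illustrative helper name rather than anything defined in the paper.

```python
import math


def kl(p, q):
    """KL divergence sum_u p(u) log(p(u)/q(u)) for pmfs with q(u) > 0."""
    return sum(pu * math.log(pu / qu) for pu, qu in zip(p, q) if pu > 0)


def blockwise_cost(pi0, pi1, induced):
    """Sequential cost coefficient for a period-T design.

    `induced` lists, for each of the T rules, the pair (f0, f1) of pmfs that
    the rule induces on its output alphabet under H = 0 and H = 1.
    """
    T = len(induced)
    d0 = sum(kl(f0, f1) for f0, f1 in induced)   # D^0 summed over the period
    d1 = sum(kl(f1, f0) for f0, f1 in induced)   # D^1 summed over the period
    return T * pi0 / d0 + T * pi1 / d1


# Hypothetical binary-output rules, each given as (f0, f1).
rule_a = ([0.9, 0.1], [0.4, 0.6])
rule_b = ([0.7, 0.3], [0.1, 0.9])
print(blockwise_cost(0.5, 0.5, [rule_a]))            # stationary design, T = 1
print(blockwise_cost(0.5, 0.5, [rule_a, rule_b]))    # alternating design, T = 2
```

Repeating the same rule over a longer period leaves the coefficient unchanged, which is why the class $\mathcal{C}_1$ sits inside every $\mathcal{C}_T$.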

It is well known that there exist optimal quantizers (when unrestricted) that can be expressed as
threshold rules based on the log likelihood ratio (LLR) [19]. Our counterexamples in the previous sections
imply that the thresholds need not be stationary (i.e., the threshold may differ from sample to sample). In the
remainder of this section, we address a partial converse to this issue: specifically, if we restrict ourselves
to stationary (or blockwise stationary) quantizer designs, then there exists an optimal design consisting of
LLR-based threshold rules.

It turns out that the analysis for the case $T > 1$ can be reduced to an analysis that is closely related to our
earlier analysis for $T = 1$. Indeed, consider the sequential cost coefficient for the time step $n = 1$, where
the rules for the other time steps are held fixed. From (24) we have

$$G_\phi = \frac{T\pi^0}{D^0_{\phi_1} + s_0} + \frac{T\pi^1}{D^1_{\phi_1} + s_1}$$

for non-negative constants $s_0$ and $s_1$. As we will show, our earlier analysis of the sequential cost coefficient,
in which $s_0 = s_1 = 0$, carries through to the case in which these values are non-zero. This allows us to
provide (in Theorem 9) a characterization of the optimal blockwise stationary quantizer.

Definition 4. The quantizer design function $\phi : \mathcal{X} \to \mathcal{U}$ is said to be a likelihood ratio
threshold rule if there are thresholds $d_0 = -\infty < d_1 < \ldots < d_K = +\infty$, and a permutation
$(u_1, \ldots, u_K)$ of $(0, 1, \ldots, K-1)$ such that for $l = 1, \ldots, K$, with $\mathbb{P}_0$-probability 1, we have:

$$\phi(X) = u_l \quad \text{if } d_{l-1} < f^1(X)/f^0(X) < d_l.$$

When $f^1(X)/f^0(X) = d_{l-1}$, set $\phi(X) = u_{l-1}$ or $\phi(X) = u_l$ with $\mathbb{P}_0$-probability 1.¹
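
On a finite observation alphabet, a rule of this form can be written down directly. The sketch below is a minimal illustration under our own assumptions: observations are indexed $0, \ldots, n-1$, $f^0(x) > 0$ for every $x$, and ties at a threshold are sent to the upper cell (one of the two choices the definition allows). The function name and example pmfs are hypothetical.

```python
import bisect


def llr_threshold_quantizer(f0, f1, thresholds, labels):
    """Build phi on a finite alphabet {0, ..., n-1} from likelihood-ratio thresholds.

    thresholds: finite cut points d_1 < ... < d_{K-1} (d_0 = -inf, d_K = +inf implied).
    labels:     a permutation (u_1, ..., u_K) of (0, ..., K-1).
    Ties f1/f0 == d_l are sent to the upper cell.
    """
    assert len(labels) == len(thresholds) + 1
    phi = []
    for p0, p1 in zip(f0, f1):
        ratio = p1 / p0                          # assumes f0(x) > 0 for every x
        cell = bisect.bisect_right(thresholds, ratio)
        phi.append(labels[cell])
    return phi


# Hypothetical pmfs under H = 0 and H = 1; likelihood ratios are 0.2, 2/3, 3.5.
f0 = [0.5, 0.3, 0.2]
f1 = [0.1, 0.2, 0.7]
print(llr_threshold_quantizer(f0, f1, [1.0], (0, 1)))          # binary rule
print(llr_threshold_quantizer(f0, f1, [0.5, 2.0], (2, 0, 1)))  # ternary rule
```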

Previous work on the extremal properties of likelihood ratio based quantizers guarantees that the
Kullback-Leibler divergence is maximized by a LLR-based quantizer [18]. In our case, however, the sequential
cost coefficient $G_\phi$ involves a pair of KL divergences, $D^0_\phi$ and $D^1_\phi$, which are related to one another
in a non-trivial manner. Hence, establishing asymptotic optimality of LLR-based rules for this cost function
does not follow from existing results, but rather requires further understanding of the interplay between
these two KL divergences.

The following lemma concerns certain "unnormalized" variants of the Kullback-Leibler (KL) divergence.
Given vectors $a = (a_0, a_1)$ and $b = (b_0, b_1)$, we define functions $D^0$ and $D^1$ mapping from
$\mathbb{R}^4_+$ to the real line as follows:

$$D^0(a,b) := a_0 \log\frac{a_0}{a_1} + b_0 \log\frac{b_0}{b_1}, \tag{25a}$$

$$D^1(a,b) := a_1 \log\frac{a_1}{a_0} + b_1 \log\frac{b_1}{b_0}. \tag{25b}$$

These functions are related to the standard (normalized) KL divergence via the relations
$D^0(a, 1-a) = D(a_0, a_1)$ and $D^1(a, 1-a) = D(a_1, a_0)$.

Lemma 5. For any positive scalars $a_1, b_1, c_1, a_0, b_0, c_0$ such that
$\frac{a_1}{a_0} < \frac{b_1}{b_0} < \frac{c_1}{c_0}$, at least one of the two following conditions must hold:

$$D^0(a, b+c) > D^0(b, c+a) \quad \text{and} \quad D^1(a, b+c) > D^1(b, c+a), \quad \text{or} \tag{26a}$$

$$D^0(c, a+b) > D^0(b, c+a) \quad \text{and} \quad D^1(c, a+b) > D^1(b, c+a). \tag{26b}$$
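
The two conditions are easy to evaluate for concrete numbers. The sketch below implements the unnormalized divergences (25a) and (25b) and reports which of (26a) and (26b) hold for a triple of hypothetical masses with ordered ratios; the helper names are ours.

```python
import math


def D0(a, b):
    """Unnormalized divergence (25a); a and b are pairs (x0, x1) of positive numbers."""
    return a[0] * math.log(a[0] / a[1]) + b[0] * math.log(b[0] / b[1])


def D1(a, b):
    """Unnormalized divergence (25b)."""
    return a[1] * math.log(a[1] / a[0]) + b[1] * math.log(b[1] / b[0])


def add(x, y):
    """Componentwise sum of two pairs."""
    return (x[0] + y[0], x[1] + y[1])


def lemma5_conditions(a, b, c):
    """Return the truth values of (26a) and (26b) for the given triples."""
    mid0, mid1 = D0(b, add(c, a)), D1(b, add(c, a))
    cond_a = D0(a, add(b, c)) > mid0 and D1(a, add(b, c)) > mid1
    cond_b = D0(c, add(a, b)) > mid0 and D1(c, add(a, b)) > mid1
    return cond_a, cond_b


# Hypothetical masses (x0, x1) with a1/a0 < b1/b0 < c1/c0.
a, b, c = (0.5, 0.1), (0.3, 0.2), (0.2, 0.7)
print(lemma5_conditions(a, b, c))
```

Intuitively, isolating the middle set $b$ is the only one of the three binary groupings that is not a likelihood ratio threshold split, and the lemma says that at least one threshold split does better on both divergences.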

¹This last requirement of the definition is termed the canonical likelihood ratio quantizer by Tsitsiklis [18].
Although one could consider performing additional randomization when there are ties, our later results (in
particular, Lemma 7) establish that in this case, randomization will not further decrease the optimal cost.

This lemma implies that under certain conditions on the ordering of the probability ratios, one can
increase both KL divergences by re-quantizing. This insight is used in the following lemma to establish
that the optimal quantizer $\phi$ behaves almost like a likelihood ratio rule. To state the result, recall that,
for a measurable function $f$, the essential supremum is the infimum of the set of all $\eta$ such that
$f(x) \le \eta$ for $\mathbb{P}_0$-almost all $x$ in the domain.

Lemma 6. If $\phi$ is an asymptotically optimal quantizer, then for all pairs $(u_1, u_2) \in \mathcal{U}$,
$u_1 \ne u_2$, there holds:

$$\frac{f^1(u_1)}{f^0(u_1)} \notin \left( \operatorname*{ess\,inf}_{x : \phi(x) = u_2} \frac{f^1(x)}{f^0(x)}, \;\; \operatorname*{ess\,sup}_{x : \phi(x) = u_2} \frac{f^1(x)}{f^0(x)} \right).$$

Note that a likelihood ratio rule guarantees something stronger: for $\mathbb{P}_0$-almost all $x$ such that
$\phi(x) = u_1$, $f^1(x)/f^0(x)$ takes a value either to the left or to the right, but not to both sides, of the
interval specified above.

Lemma 7, stated below, essentially guarantees quasiconcavity of $G_\phi$ for the case of binary quantizers. To
state the result, let $F : [0,1]^2 \to \mathbb{R}$ be given by

$$F(a_0, a_1) = \frac{c_0}{D(a_0, a_1) + d_0} + \frac{c_1}{D(a_1, a_0) + d_1}. \tag{27}$$

Lemma 7. For any non-negative constants $c_0, c_1, d_0, d_1$, the function $F$ defined in (27) is quasiconcave.
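
Quasiconcavity means that along any line segment the function never drops below the smaller of its two endpoint values. The sketch below probes this property of $F$ numerically on a single segment; it is a spot check under our own choice of points and constants, not a substitute for the proof, and the helper names are hypothetical.

```python
import math


def bern_kl(p, q):
    """KL divergence D(p, q) between Bernoulli(p) and Bernoulli(q), 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def F(a0, a1, c0=1.0, c1=1.0, d0=0.0, d1=0.0):
    """The function in (27); with d0 = d1 = 0 it is only defined for a0 != a1."""
    return c0 / (bern_kl(a0, a1) + d0) + c1 / (bern_kl(a1, a0) + d1)


def segment_gap(p, q, steps=50, **consts):
    """Smallest value of F(z) - min(F(p), F(q)) over grid points z on the segment [p, q].

    Quasiconcavity predicts a non-negative result (up to rounding).
    """
    floor = min(F(*p, **consts), F(*q, **consts))
    gaps = []
    for i in range(steps + 1):
        t = i / steps
        z = (t * p[0] + (1 - t) * q[0], t * p[1] + (1 - t) * q[1])
        gaps.append(F(*z, **consts) - floor)
    return min(gaps)


print(segment_gap((0.2, 0.6), (0.4, 0.8)))
```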

We provide a proof of this result in the Appendix. An immediate consequence of Lemma 7 is that
LLR-based quantizers exist for the class of randomized quantizers with binary outputs.

Corollary 8. Restricting to the class of (blockwise) stationary binary quantizers, there exists an
asymptotically optimal quantizer $\phi$ that is a (deterministic) likelihood ratio threshold rule.

Proof: Let $\phi$ be a (randomized) binary quantizer. The sequential cost coefficient can be written as
$G_\phi = F(f^0(0), f^1(0))$. The set of pairs $\{(f^0(0), f^1(0))\}$ over all $\phi$ is a convex set whose extreme
points can be realized by deterministic likelihood ratio threshold rules (Prop. 3.2 of [18]). Since the minimum
of a quasiconcave function must lie at one such extreme point [4], the corollary is immediate as a consequence
of Lemma 7.

□
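
For a small finite alphabet the claim can be illustrated by brute force: enumerate every deterministic binary quantizer, evaluate the stationary cost coefficient $\pi^0/D^0_\phi + \pi^1/D^1_\phi$, and inspect the minimizer. The pmfs and prior below are hypothetical, and the helpers are illustrative only.

```python
import itertools
import math


def kl(p, q):
    """KL divergence between two pmfs on the same finite alphabet."""
    return sum(pu * math.log(pu / qu) for pu, qu in zip(p, q) if pu > 0)


def induced(phi, f, K=2):
    """pmf of the quantizer output phi(X) when X has pmf f on {0, ..., n-1}."""
    out = [0.0] * K
    for x, u in enumerate(phi):
        out[u] += f[x]
    return out


def cost(phi, f0, f1, pi0, pi1):
    """Stationary sequential cost coefficient pi0/D^0_phi + pi1/D^1_phi."""
    g0, g1 = induced(phi, f0), induced(phi, f1)
    d0, d1 = kl(g0, g1), kl(g1, g0)
    if d0 <= 0 or d1 <= 0:               # uninformative quantizer: infinite cost
        return math.inf
    return pi0 / d0 + pi1 / d1


def is_threshold_rule(phi, f0, f1):
    """True if the binary labels change at most once along increasing f1/f0."""
    order = sorted(range(len(phi)), key=lambda x: f1[x] / f0[x])
    labels = [phi[x] for x in order]
    return sum(labels[i] != labels[i + 1] for i in range(len(labels) - 1)) <= 1


# Hypothetical pmfs on a three-point alphabet, already sorted by f1/f0.
f0 = [0.5, 0.3, 0.2]
f1 = [0.1, 0.2, 0.7]
candidates = [phi for phi in itertools.product((0, 1), repeat=3) if 0 < sum(phi) < 3]
best = min(candidates, key=lambda phi: cost(phi, f0, f1, 0.5, 0.5))
print(best, is_threshold_rule(best, f0, f1))
```

The enumeration grows as $2^n$ in the alphabet size, so it is useful only as a sanity check; the corollary is what justifies searching over thresholds alone.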

It turns out that the same statement can also be proved for deterministic quantizers with arbitrary output 
alphabets: 

Theorem 9. Restricting to the class of (blockwise) stationary and deterministic decision rules, there
exists an asymptotically optimal quantizer $\phi$ that is a likelihood ratio threshold rule.

We present the full proof of this theorem in the Appendix. The proof exploits both Lemma 6 and
Lemma 7.



6 Discussion 

In this paper, we have studied the problem of sequential decentralized detection. For quantizers with neither 
local memory nor feedback (Case A in the taxonomy of Veeravalli et al. [22]), we have established that 
stationary designs need not be optimal in general. Moreover, we have shown that in the asymptotic setting 
(i.e., when the cost per sample goes to zero), there is a class of problems for which there exists a range of 
prior probabilities over which stationary strategies are suboptimal. 
[Figure 1: Illustration of the domain $A$. Horizontal axis: $a_0$, with marked points $(b_1 - b_0)/b_1$ and
$1 - b_0$; vertical axis: $a_1$, with marked point $1 - b_1$.]

There are a number of open questions raised by the analysis in this paper. First, our analysis has
established only that the best stationary rule chosen from a finite set of deterministic quantizers need not be
optimal. Is there a corresponding example with an infinite number of deterministic stationary quantizer
designs for which none is optimal? Second, Corollary 8 establishes the optimality of likelihood ratio rules for
randomized decision rules that produce binary outputs. This proof was based on the quasiconcavity of the
function that specifies the asymptotic sequential cost coefficient. Is this function also quasiconcave
for quantizers other than binary ones? Such quasiconcavity would extend the validity of Theorem 9 to the
general class of randomized quantizers.
Proof of Lemma 5

By renormalizing, we can assume w.l.o.g. that $a_1 + b_1 + c_1 = a_0 + b_0 + c_0 = 1$. Also w.l.o.g., assume
that $b_1 \ge b_0$. Thus, $c_1 > c_0$ and $a_1 < a_0$. Replacing $c_1 = 1 - a_1 - b_1$ and $c_0 = 1 - a_0 - b_0$, the
inequality $c_1/c_0 > b_1/b_0$ is equivalent to $a_1 < a_0 b_1/b_0 - (b_1 - b_0)/b_0$.

We fix values of $b$, and consider varying $a \in A$, where $A$ denotes the domain for $(a_0, a_1)$ governed by
the following equality and inequality constraints: $0 < a_1 < 1 - b_1$; $0 < a_0 < 1 - b_0$; $a_1 < a_0$; and

$$a_1 < a_0 b_1/b_0 - (b_1 - b_0)/b_0. \tag{28}$$

Note that the third constraint ($a_1 < a_0$) is redundant due to the other three constraints. In particular,
constraint (28) corresponds to a line passing through $((b_1 - b_0)/b_1, 0)$ and $(1 - b_0, 1 - b_1)$ in the
$(a_0, a_1)$ coordinates. As a result, $A$ is the interior of the triangle defined by this line and two other lines
given by $a_1 = 0$ and $a_0 = 1 - b_0$ (see Figure 1).

Since both $D^0(a, 1-a)$ and $D^1(a, 1-a)$ correspond to KL divergences, they are convex functions with
respect to $(a_0, a_1)$. In addition, the derivatives with respect to $a_1$ are
$\frac{a_1 - a_0}{a_1(1-a_1)} < 0$ and $\log\frac{a_1(1-a_0)}{a_0(1-a_1)} < 0$,
respectively. Hence, both functions can be (strictly) bounded from below by increasing $a_1$ while keeping $a_0$
unchanged, i.e., by replacing $a_1$ by $a_1'$ so that $(a_0, a_1')$ lies on the line given by (28), which is
equivalent to the constraint $c_1/c_0 = b_1/b_0$. Let $c_1' = 1 - b_1 - a_1'$; then $c_1'/c_0 = b_1/b_0$. Our
argument has established inequalities (a) and (b) in the following chain of inequalities:

$$D^1(a, b+c) \overset{(a)}{>} a_1' \log\frac{a_1'}{a_0} + (b_1 + c_1') \log\frac{b_1 + c_1'}{b_0 + c_0} \tag{29a}$$

$$\overset{(b)}{=} a_1' \log\frac{a_1'}{a_0} + c_1' \log\frac{c_1'}{c_0} + b_1 \log\frac{b_1}{b_0} \tag{29b}$$

$$\overset{(c)}{\ge} (a_1' + c_1') \log\frac{a_1' + c_1'}{a_0 + c_0} + b_1 \log\frac{b_1}{b_0} \tag{29c}$$

$$= D^1(a + c, b), \tag{29d}$$

inequality (c) follows from an application of the log-sum inequality 111. A similar conclusion holds for 

$D^0(a, b + c)$.
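
For reference, step (c) uses the log-sum inequality in its two-term form. Stated for positive numbers (the symbols $x_i$, $y_i$ below are generic placeholders, not notation from the paper), it reads:

```latex
% Log-sum inequality for positive numbers x_1, x_2, y_1, y_2:
\[
  x_1 \log\frac{x_1}{y_1} + x_2 \log\frac{x_2}{y_2}
  \;\ge\; (x_1 + x_2) \log\frac{x_1 + x_2}{y_1 + y_2},
\]
% with equality if and only if x_1 / y_1 = x_2 / y_2.
% Step (c) applies it with (x_1, y_1) = (a_1', a_0) and (x_2, y_2) = (c_1', c_0).
```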

Proof of Lemma 6

Suppose the opposite is true, that there exist two sets $S_1, S_2$ with positive $\mathbb{P}_0$-measure such that
$\phi(X) = u_2$ for any $X \in S_1 \cup S_2$, and

$$\frac{f^1(S_1)}{f^0(S_1)} < \frac{f^1(u_1)}{f^0(u_1)} < \frac{f^1(S_2)}{f^0(S_2)}. \tag{30}$$

By reassigning $S_1$ or $S_2$ to the quantile $u_1$, we are guaranteed to have a new quantizer $\phi'$ such that
$D^0_{\phi'} > D^0_\phi$ and $D^1_{\phi'} > D^1_\phi$, thanks to Lemma 5. As a result, $\phi'$ has a smaller sequential
cost $J^*_{\phi'}$, which is a contradiction.

Proof of Lemma 7

The proof of this lemma is conceptually straightforward, but the algebra is involved. To simplify the
notation, we replace $a_0$ by $x$, $a_1$ by $y$, the function $D(a_0, a_1)$ by $f(x,y)$, and the function
$D(a_1, a_0)$ by $g(x,y)$. Finally, we assume that $d_0 = d_1 = 0$; the proof will reveal that this case is
sufficient to establish the more general result with arbitrary non-negative scalars $d_0$ and $d_1$.

We have $f(x,y) = x\log(x/y) + (1-x)\log[(1-x)/(1-y)]$ and
$g(x,y) = y\log(y/x) + (1-y)\log[(1-y)/(1-x)]$. Note that both $f$ and $g$ are convex functions and are
non-negative in their domains, and moreover that we have $F(x,y) = c_0/f(x,y) + c_1/g(x,y)$. In order to
establish the quasiconcavity of $F$, it suffices to show that for any $(x,y)$ in the domain of $F$, for any
vector $h = [h_0 \; h_1]^T$ such that $h^T \nabla F(x,y) = 0$, there holds

$$h^T \nabla^2 F(x,y)\, h < 0 \tag{31}$$

(see Boyd and Vandenberghe [4]). Here we adopt the standard notation of $\nabla F$ for the gradient vector of
$F$, and $\nabla^2 F$ for its Hessian matrix. We also use $F_x$ to denote the partial derivative with respect to
variable $x$, $F_{xy}$ to denote the partial derivative with respect to $x$ and $y$, and so on.

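We have $\nabla F = -c_0 \nabla f / f^2 - c_1 \nabla g / g^2$. Thus, it suffices to prove relation (31) for
vectors of the form

$$h = \left[ -\frac{c_0 f_y}{f^2} - \frac{c_1 g_y}{g^2}, \;\; \frac{c_0 f_x}{f^2} + \frac{c_1 g_x}{g^2} \right]^T.$$

It is convenient to write $h = c_0 v_0 + c_1 v_1$, where $v_0 = [-f_y/f^2 \;\; f_x/f^2]^T$ and
$v_1 = [-g_y/g^2 \;\; g_x/g^2]^T$. The Hessian matrix $\nabla^2 F$ can be written as
$\nabla^2 F = c_0 H_0 + c_1 H_1$, where

$$H_0 = -\frac{1}{f^3} \begin{bmatrix} f_{xx} f - 2 f_x^2 & f_{xy} f - 2 f_x f_y \\ f_{xy} f - 2 f_x f_y & f_{yy} f - 2 f_y^2 \end{bmatrix}$$

and

$$H_1 = -\frac{1}{g^3} \begin{bmatrix} g_{xx} g - 2 g_x^2 & g_{xy} g - 2 g_x g_y \\ g_{xy} g - 2 g_x g_y & g_{yy} g - 2 g_y^2 \end{bmatrix}.$$

Now observe that

$$h^T \nabla^2 F \, h = (c_0 v_0 + c_1 v_1)^T (c_0 H_0 + c_1 H_1)(c_0 v_0 + c_1 v_1),$$

which can be simplified to

$$c_0^3\, v_0^T H_0 v_0 + c_1^3\, v_1^T H_1 v_1 + c_0^2 c_1 \left( 2 v_0^T H_0 v_1 + v_0^T H_1 v_0 \right) + c_0 c_1^2 \left( 2 v_0^T H_1 v_1 + v_1^T H_0 v_1 \right).$$

This function is a polynomial in $c_0$ and $c_1$, which are restricted to be non-negative scalars (at least one of
which is assumed to be non-zero). Therefore, it suffices to prove that all the coefficients of this polynomial
(with respect to $c_0$ and $c_1$) are strictly negative. In particular, we shall show that

(i) $v_0^T H_0 v_0 \le 0$, and

(ii) $2 v_0^T H_0 v_1 + v_0^T H_1 v_0 \le 0$,

where in both cases equality occurs only if $x = y$, which is outside of the domain of $F$. The strict negativity
of the other two coefficients follows from entirely analogous arguments.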

First, some straightforward algebra shows that inequality (i) is equivalent to the relation

$$f_{xx} f_y^2 + f_{yy} f_x^2 \ge 2 f_x f_y f_{xy}.$$

But note that $f$ is a convex function, so $f_{xx} f_{yy} \ge f_{xy}^2$. Hence, we have

$$f_{xx} f_y^2 + f_{yy} f_x^2 \overset{(a)}{\ge} 2\sqrt{f_{xx} f_{yy}}\, |f_x f_y| \overset{(b)}{\ge} 2 f_x f_y f_{xy},$$

thereby proving (i). (In this argument, inequality (a) follows from the fact that $a^2 + b^2 \ge 2ab$, whereas
inequality (b) follows from the strict convexity of $f$. Equality occurs only if $x = y$.)

Regarding (ii), some further algebra reduces it to the inequality

$$G_1 + G_2 - G_3 \ge 0, \tag{32}$$

where

$$G_1 = 2\left( f_y g_y f_{xx} + f_x g_x f_{yy} - (f_y g_x + f_x g_y) f_{xy} \right),$$

$$G_2 = f_y^2 g_{xx} + f_x^2 g_{yy} - 2 f_x f_y g_{xy},$$

$$G_3 = \frac{2}{g} (f_y g_x - f_x g_y)^2.$$

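At this point in the proof, we need to exploit specific information about the functions $f$ and $g$, which
are defined in terms of KL divergences. To simplify notation, we let $u = x/y$ and $v = (1-x)/(1-y)$.
Computing derivatives, we have

$$f_x(x,y) = \log(x/y) - \log((1-x)/(1-y)) = \log(u/v),$$
$$f_y(x,y) = (1-x)/(1-y) - x/y = v - u,$$
$$g_x(x,y) = (1-y)/(1-x) - y/x = 1/v - 1/u,$$
$$g_y(x,y) = \log(y/x) - \log((1-y)/(1-x)) = \log(v/u),$$

$$\nabla^2 f(x,y) = \begin{bmatrix} \dfrac{1}{x(1-x)} & -\dfrac{1}{y(1-y)} \\[2ex] -\dfrac{1}{y(1-y)} & \dfrac{x}{y^2} + \dfrac{1-x}{(1-y)^2} \end{bmatrix}
\quad \text{and} \quad
\nabla^2 g(x,y) = \begin{bmatrix} \dfrac{y}{x^2} + \dfrac{1-y}{(1-x)^2} & -\dfrac{1}{x(1-x)} \\[2ex] -\dfrac{1}{x(1-x)} & \dfrac{1}{y(1-y)} \end{bmatrix}.$$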
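
The closed-form first derivatives of $f$ and $g$ in terms of $u$ and $v$ can be checked against central finite differences. The sketch below does this at one arbitrarily chosen point; the step size and evaluation point are our own choices.

```python
import math


def f(x, y):
    """f(x, y) = D(x, y), the KL divergence between Bernoulli(x) and Bernoulli(y)."""
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))


def g(x, y):
    """g(x, y) = D(y, x)."""
    return f(y, x)


def analytic_gradients(x, y):
    """Closed forms (f_x, f_y, g_x, g_y) in terms of u = x/y, v = (1-x)/(1-y)."""
    u, v = x / y, (1 - x) / (1 - y)
    return math.log(u / v), v - u, 1 / v - 1 / u, math.log(v / u)


def numeric_gradients(x, y, h=1e-6):
    """Central finite differences of f and g."""
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    gx = (g(x + h, y) - g(x - h, y)) / (2 * h)
    gy = (g(x, y + h) - g(x, y - h)) / (2 * h)
    return fx, fy, gx, gy


exact = analytic_gradients(0.3, 0.6)
approx = numeric_gradients(0.3, 0.6)
print(max(abs(e - a) for e, a in zip(exact, approx)))
```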
Noting that $f_x = -g_y$; $g_{xy} = -f_{xx}$; $f_{xy} = -g_{yy}$, we see that equation (32) is equivalent to

$$2\left( f_x g_x f_{yy} + f_y g_x g_{yy} \right) - f_x^2 g_{yy} + f_y^2 g_{xx} \ge \frac{2}{g} (f_y g_x - f_x g_y)^2. \tag{33}$$

To simplify the algebra further, we shall make use of the inequality $(\log t)^2 \le (t-1)^2/t$, which is valid
for any $t > 0$. This implies that

$$f_y g_x = (v - u)(1/v - 1/u) \le f_x g_y = -(\log(u/v))^2 = -f_x^2 = -g_y^2 \le 0.$$

Thus, $-f_x^2 g_{yy} \ge f_y g_x g_{yy}$, and
$\frac{2}{g}(f_y g_x - f_x g_y)^2 \le \frac{2}{g} f_y g_x (f_y g_x - f_x g_y)$. As a result, (33) would follow if
we can show that

$$2\left( f_x g_x f_{yy} + f_y g_x g_{yy} \right) + f_y g_x g_{yy} + f_y^2 g_{xx} \ge \frac{2}{g} f_y g_x (f_y g_x - f_x g_y).$$

For all $x \ne y$, we may divide both sides by $-f_y(x,y)\, g_x(x,y) > 0$. Consequently, it suffices to show that:

$$-2 f_x f_{yy}/f_y - f_y g_{xx}/g_x - 3 g_{yy} \ge \frac{2}{g} (f_x g_y - g_x f_y),$$

or, equivalently,

$$2 \log(u/v) \left( \frac{v}{u-1} + \frac{u}{1-v} \right) + \left( \frac{u}{1-x} + \frac{v}{x} \right) - \frac{3}{y(1-y)} \ge \frac{2}{g} \left( \frac{(u-v)^2}{uv} - \log^2\frac{u}{v} \right),$$

or, equivalently,

$$2 \log(u/v)\, \frac{(u-v)(u+v-1)}{(u-1)(1-v)} + \frac{(u-v)^2 (u+v-4uv)}{uv(u-1)(1-v)} \ge \frac{2}{g} \left( \frac{(u-v)^2}{uv} - \log^2\frac{u}{v} \right). \tag{34}$$



Due to the symmetry, it suffices to prove (34) for $x < y$. In particular, we shall use the following inequality
for the logarithmic mean [13], which holds for $u \ne v$:

$$\frac{3}{2\sqrt{uv} + (u+v)/2} < \frac{\log u - \log v}{u - v} < \frac{1}{(uv(u+v)/2)^{1/3}}.$$
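
The two-sided bound on $(\log u - \log v)/(u - v)$ is easy to sanity-check numerically. The sketch below evaluates both bounds for a few arbitrarily chosen pairs; the function name is ours.

```python
import math


def log_mean_bounds(u, v):
    """Return (lower, value, upper) for (log u - log v)/(u - v), with u != v and u, v > 0."""
    geometric = math.sqrt(u * v)
    arithmetic = (u + v) / 2
    lower = 3 / (2 * geometric + arithmetic)
    value = (math.log(u) - math.log(v)) / (u - v)
    upper = 1 / (u * v * arithmetic) ** (1 / 3)
    return lower, value, upper


for u, v in [(0.5, 2.0), (0.1, 3.0), (0.9, 1.1)]:
    lower, value, upper = log_mean_bounds(u, v)
    print(u, v, lower < value < upper)
```

Both bounds become tight as $u$ and $v$ approach each other, which is consistent with equality in (34) occurring only at $x = y$.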

We shall replace $\frac{\log(u/v)}{u - v}$ in (34) by appropriate upper and lower bounds. In addition, we shall
also bound $g(x,y)$ from below, using the following argument. When $x < y$, we have $u < 1 < v$, and

$$g(x,y) = y \log\frac{y}{x} + (1-y) \log\frac{1-y}{1-x} > \frac{3y(y-x)}{2\sqrt{xy} + (x+y)/2} + \frac{(1-y)(x-y)}{\left[ (1-x)(1-y)(1 - (x+y)/2) \right]^{1/3}}$$

$$= \frac{3(1-u)(1-v)}{(u-v)\left( 2\sqrt{u} + (u+1)/2 \right)} - \frac{(1-u)(1-v)}{(u-v)\left( v(v+1)/2 \right)^{1/3}}.$$

Let us denote this lower bound by $q(u,v)$.

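Having got rid of the logarithm terms, (34) will hold if we can prove the following:

$$\frac{6(u-v)^2 (u+v-1)}{\left( 2\sqrt{uv} + (u+v)/2 \right)(u-1)(1-v)} + \frac{(u-v)^2 (u+v-4uv)}{uv(u-1)(1-v)} \ge \frac{2}{q(u,v)} \left( \frac{(u-v)^2}{uv} - \frac{9(u-v)^2}{\left( 2\sqrt{uv} + (u+v)/2 \right)^2} \right),$$

or equivalently,

$$\left( \frac{6(u+v-1)}{2\sqrt{uv} + (u+v)/2} + \frac{u+v-4uv}{uv} \right) \left( \frac{3}{(v-u)\left( 2\sqrt{u} + (u+1)/2 \right)} - \frac{1}{(v-u)\left( v(v+1)/2 \right)^{1/3}} \right) \ge 2 \left( \frac{1}{uv} - \frac{9}{\left( 2\sqrt{uv} + (u+v)/2 \right)^2} \right), \tag{35}$$

which is equivalent to

$$\frac{(u+v-2\sqrt{uv})\left( (u+v)/2 + 3\sqrt{uv} + 4uv \right)}{\left( 2\sqrt{uv} + (u+v)/2 \right) uv} \cdot \frac{3\left( v(v+1)/2 \right)^{1/3} - \left( 2\sqrt{u} + (u+1)/2 \right)}{(v-u)\left( 2\sqrt{u} + (u+1)/2 \right)\left( v(v+1)/2 \right)^{1/3}} \ge \frac{(u+v-2\sqrt{uv})\left( (u+v)/2 + 5\sqrt{uv} \right)}{uv \left( 2\sqrt{uv} + (u+v)/2 \right)^2}, \tag{36}$$

and also equivalent to

$$\left( (u+v)/2 + 2\sqrt{uv} \right)\left( (u+v)/2 + 3\sqrt{uv} + 4uv \right)\left[ 3\left( v(v+1)/2 \right)^{1/3} - \left( 2\sqrt{u} + (u+1)/2 \right) \right] \ge \left( 2\sqrt{u} + (u+1)/2 \right)\left( v(v+1)/2 \right)^{1/3} \left( (u+v)/2 + 5\sqrt{uv} \right)(v-u). \tag{37}$$

It can be checked by tedious but straightforward calculus that inequality (37) holds for any $u \le 1 \le v$,
and equality holds when $u = 1 = v$, i.e., $x = y$.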

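Proof of Theorem 9

Suppose that $\phi$ is not a likelihood ratio rule. Then there exist positive $\mathbb{P}_0$-probability disjoint sets
$S_1, S_2, S_3$ such that for any $X_1 \in S_1$, $X_2 \in S_2$, $X_3 \in S_3$,

$$\phi(X_1) = \phi(X_3) = u_1, \tag{38a}$$
$$\phi(X_2) = u_2 \ne u_1, \tag{38b}$$
$$\frac{f^1(X_1)}{f^0(X_1)} < \frac{f^1(X_2)}{f^0(X_2)} < \frac{f^1(X_3)}{f^0(X_3)}. \tag{38c}$$

Define the probability of the quantiles as:

$$f^0(u_1) := \mathbb{P}_0(\phi(X) = u_1), \quad \text{and} \quad f^0(u_2) := \mathbb{P}_0(\phi(X) = u_2),$$
$$f^1(u_1) := \mathbb{P}_1(\phi(X) = u_1), \quad \text{and} \quad f^1(u_2) := \mathbb{P}_1(\phi(X) = u_2).$$

Similarly, for the sets $S_1$, $S_2$ and $S_3$, we define

$$a_0 = f^0(S_1), \quad b_0 = f^0(S_2) \quad \text{and} \quad c_0 = f^0(S_3),$$
$$a_1 = f^1(S_1), \quad b_1 = f^1(S_2) \quad \text{and} \quad c_1 = f^1(S_3).$$

Finally, let $p_0, p_1, q_0$ and $q_1$ denote the probability measures of the "residuals":

$$p_0 = f^0(u_2) - b_0, \quad p_1 = f^1(u_2) - b_1,$$
$$q_0 = f^0(u_1) - a_0 - c_0, \quad q_1 = f^1(u_1) - a_1 - c_1.$$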
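Note that we have $\frac{a_1}{a_0} < \frac{b_1}{b_0} < \frac{c_1}{c_0}$. In addition, the sets $S_1$ and $S_3$ were
chosen so that $\frac{a_1}{a_0} \le \frac{q_1}{q_0} \le \frac{c_1}{c_0}$. From Lemma 6, there holds
$\frac{p_1 + b_1}{p_0 + b_0} = \frac{f^1(u_2)}{f^0(u_2)} \notin \left( \frac{a_1}{a_0}, \frac{c_1}{c_0} \right)$. We may
assume without loss of generality that $\frac{p_1 + b_1}{p_0 + b_0} \le \frac{a_1}{a_0}$. Then,
$\frac{p_1 + b_1}{p_0 + b_0} < \frac{b_1}{b_0}$, so $\frac{p_1}{p_0} < \frac{p_1 + b_1}{p_0 + b_0}$. Overall, we are
guaranteed to have the ordering

$$\frac{p_1}{p_0} < \frac{p_1 + b_1}{p_0 + b_0} \le \frac{a_1}{a_0} < \frac{b_1}{b_0} < \frac{c_1}{c_0}. \tag{39}$$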

Our strategy will be to modify the quantizer $\phi$ only for those $X$ for which $\phi(X)$ takes the values $u_1$ or
$u_2$, such that the resulting quantizer is defined by a LLR-based threshold, and has a smaller (or equal) value
of the corresponding cost $J^*_\phi$. For simplicity in notation, we use $\mathcal{A}$ to denote the set with measures
under $\mathbb{P}_0$ and $\mathbb{P}_1$ equal to $a_0$ and $a_1$; the sets $\mathcal{B}$, $\mathcal{C}$, $\mathcal{P}$ and
$\mathcal{Q}$ are defined in an analogous manner. We begin by observing that we have either
$\frac{a_1}{a_0} \le \frac{q_1 + a_1}{q_0 + a_0} \le \frac{b_1}{b_0}$ or
$\frac{b_1}{b_0} \le \frac{q_1 + c_1}{q_0 + c_0} \le \frac{c_1}{c_0}$. Thus, in our subsequent manipulation
of sets, we always bundle $\mathcal{Q}$ with either $\mathcal{A}$ or $\mathcal{C}$ accordingly without changing the
ordering of the probability ratios. Without loss of generality, then, we may disregard the residual set
corresponding to $\mathcal{Q}$ in the analysis to follow.

In the remainder of the proof, we shall show that either one of the following two modifications of the
quantizer $\phi$ will improve (decrease) the sequential cost $J^*_\phi$:

(i) Assign $\mathcal{A}$, $\mathcal{B}$ and $\mathcal{C}$ to the same quantization level $u_1$, and leave $\mathcal{P}$ to the level $u_2$, or

(ii) Assign $\mathcal{P}$, $\mathcal{A}$ and $\mathcal{B}$ to the same level $u_2$, and leave $\mathcal{C}$ to the level $u_1$.

It is clear that this modified quantizer design respects the likelihood ratio rule for the quantization indices
$u_1$ and $u_2$. By repeated application of this modification for every such pair, we are guaranteed to arrive at a
likelihood ratio quantizer that is optimal, thereby completing the proof.

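Let $a_0', b_0', c_0', p_0'$ be normalized versions of $a_0, b_0, c_0, p_0$, respectively (i.e.,
$a_0' = a_0/(p_0 + a_0 + b_0 + c_0)$, and so on). Similarly, let $a_1', b_1', c_1', p_1'$ be normalized versions of
$a_1, b_1, c_1, p_1$, respectively. With this notation, we have the relations

$$\begin{aligned}
D^0_\phi &= \sum_{u \ne u_1, u_2} f^0(u) \log\frac{f^0(u)}{f^1(u)} + (p_0 + b_0) \log\frac{p_0 + b_0}{p_1 + b_1} + (a_0 + c_0) \log\frac{a_0 + c_0}{a_1 + c_1} \\
&= A_0 + (f^0(u_1) + f^0(u_2)) \left( (p_0' + b_0') \log\frac{p_0' + b_0'}{p_1' + b_1'} + (a_0' + c_0') \log\frac{a_0' + c_0'}{a_1' + c_1'} \right) \\
&= A_0 + (f^0(u_1) + f^0(u_2))\, D^0(p' + b', a' + c'), \\
D^1_\phi &= \sum_{u \ne u_1, u_2} f^1(u) \log\frac{f^1(u)}{f^0(u)} + (p_1 + b_1) \log\frac{p_1 + b_1}{p_0 + b_0} + (a_1 + c_1) \log\frac{a_1 + c_1}{a_0 + c_0} \\
&= A_1 + (f^1(u_1) + f^1(u_2))\, D^1(p' + b', a' + c'),
\end{aligned}$$

where we define

$$A_0 := \sum_{u \ne u_1, u_2} f^0(u) \log\frac{f^0(u)}{f^1(u)} + (f^0(u_1) + f^0(u_2)) \log\frac{f^0(u_1) + f^0(u_2)}{f^1(u_1) + f^1(u_2)} \ge 0,$$

$$A_1 := \sum_{u \ne u_1, u_2} f^1(u) \log\frac{f^1(u)}{f^0(u)} + (f^1(u_1) + f^1(u_2)) \log\frac{f^1(u_1) + f^1(u_2)}{f^0(u_1) + f^0(u_2)} \ge 0,$$

due to the non-negativity of the KL divergences.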
Note that from (39) we have

$$\frac{p_1'}{p_0'} < \frac{p_1' + b_1'}{p_0' + b_0'} \le \frac{a_1'}{a_0'} < \frac{b_1'}{b_0'} < \frac{c_1'}{c_0'},$$

in addition to the normalization constraints that $p_0' + a_0' + b_0' + c_0' = p_1' + a_1' + b_1' + c_1' = 1$. It
follows that $\frac{p_1' + b_1'}{p_0' + b_0'} < 1 < \frac{a_1' + c_1'}{a_0' + c_0'}$.

Let us consider varying the values of $a_1', b_1'$, while fixing all other variables and ensuring that all the
above constraints hold. Then, $a_1' + b_1'$ is constant, and both $D^0(p' + b', a' + c')$ and $D^1(p' + b', a' + c')$
increase as $b_1'$ decreases and $a_1'$ increases. In other words, if we define $a_0'' = a_0'$, $b_0'' = b_0'$, and
$a_1''$ and $b_1''$ such that

$$\frac{a_1''}{a_0''} = \frac{b_1''}{b_0''} = \frac{1 - p_1' - c_1'}{1 - p_0' - c_0'},$$

then we have

$$D^0(p' + b', a' + c') \le D^0(p' + b'', a'' + c') \quad \text{and} \quad D^1(p' + b', a' + c') \le D^1(p' + b'', a'' + c'). \tag{40}$$

Now note that the vector $(b_0'', b_1'')$ in $\mathbb{R}^2$ is a convex combination of $(0, 0)$ and
$(a_0'' + b_0'', a_1'' + b_1'')$. It follows that $(p_0' + b_0'', p_1' + b_1'')$ is a convex combination of
$(p_0', p_1')$ and $(p_0' + a_0'' + b_0'', p_1' + a_1'' + b_1'') = (p_0' + a_0' + b_0', p_1' + a_1' + b_1')$.

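By (40), we obtain:

$$\begin{aligned}
G_\phi &= \frac{\pi^0}{A_0 + (f^0(u_1) + f^0(u_2))\, D^0(p' + b', a' + c')} + \frac{\pi^1}{A_1 + (f^1(u_1) + f^1(u_2))\, D^1(p' + b', a' + c')} \\
&\ge \frac{\pi^0}{A_0 + (f^0(u_1) + f^0(u_2))\, D^0(p' + b'', a'' + c')} + \frac{\pi^1}{A_1 + (f^1(u_1) + f^1(u_2))\, D^1(p' + b'', a'' + c')} \\
&= \frac{\pi^0}{A_0 + (f^0(u_1) + f^0(u_2))\, D(p_0' + b_0'', p_1' + b_1'')} + \frac{\pi^1}{A_1 + (f^1(u_1) + f^1(u_2))\, D(p_1' + b_1'', p_0' + b_0'')}.
\end{aligned}$$

Applying the quasiconcavity result in Lemma 7:

$$G_\phi \ge \min\left\{ \frac{\pi^0}{A_0 + (f^0(u_1) + f^0(u_2))\, D(p_0', p_1')} + \frac{\pi^1}{A_1 + (f^1(u_1) + f^1(u_2))\, D(p_1', p_0')}, \right.$$

$$\left. \frac{\pi^0}{A_0 + (f^0(u_1) + f^0(u_2))\, D(p_0' + a_0' + b_0', p_1' + a_1' + b_1')} + \frac{\pi^1}{A_1 + (f^1(u_1) + f^1(u_2))\, D(p_1' + a_1' + b_1', p_0' + a_0' + b_0')} \right\}.$$

But the two arguments of the minimum are the sequential cost coefficients corresponding to the two possible
modifications of $\phi$. Hence, the proof is complete.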